Fix #943: don't try to take 2 snapshots at a time #1081
Interesting. This PR: is pretty close to the same thing.
```diff
@@ -331,6 +331,8 @@ const (
 )

 func (s *RaftServer) ForceLogCompaction() error {
+	s.mutex.Lock()
```
Why this change?
Without the lock, influxdb attempts to take multiple snapshots at the same time. If you comment those lines out, rebuild influxdb, and then run it and `influxdb_stress` (described above) at the same time, you'll see the messages in the log.
OK, got it; I was thinking that might be it from the title of the PR. I guess I might have put the lock inside `TakeSnapshot()`. This is kind of a matter of taste, though; it depends on how one thinks about `TakeSnapshot()`.
@otoolep it was related. There were actually two spots that needed to be reset to nil.
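As an alternative to resetting the pending marker by hand at each early return, a `defer` clears it on every exit path. A minimal sketch, assuming hypothetical stand-ins for the vendored raft types (`Snapshot`, `stateMachine`, `server`, `flakySM`) and an illustrative error string; it also assumes the pending marker should never outlive the call, which the real `TakeSnapshot()` may handle differently.

```go
package main

import (
	"errors"
	"fmt"
)

// Snapshot, stateMachine and server are hypothetical, trimmed-down
// stand-ins for the vendored raft types.
type Snapshot struct {
	State []byte
}

type stateMachine interface {
	Save() ([]byte, error)
}

type server struct {
	stateMachine    stateMachine
	pendingSnapshot *Snapshot
}

// TakeSnapshot marks a snapshot as pending and clears the marker on every
// return path via defer, so an error from Save() cannot leave it set.
func (s *server) TakeSnapshot() error {
	if s.pendingSnapshot != nil {
		return errors.New("snapshot already pending")
	}
	s.pendingSnapshot = &Snapshot{}
	defer func() { s.pendingSnapshot = nil }()

	state, err := s.stateMachine.Save()
	if err != nil {
		return err // without the defer, this return would leave pendingSnapshot set
	}
	s.pendingSnapshot.State = state
	// ... write the snapshot to disk here ...
	return nil
}

// flakySM fails on its first Save() call and succeeds afterwards.
type flakySM struct{ calls int }

func (f *flakySM) Save() ([]byte, error) {
	f.calls++
	if f.calls == 1 {
		return nil, errors.New("simulated Save() failure")
	}
	return []byte("state"), nil
}

func main() {
	s := &server{stateMachine: &flakySM{}}
	fmt.Println(s.TakeSnapshot()) // simulated Save() failure
	fmt.Println(s.TakeSnapshot()) // <nil>: not blocked by a stale pendingSnapshot
}
```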
Do you know why this
@jvshahid No, but I think it's a separate issue.
Agreed, I was just wondering why it's failing to save the state in the first place. I'll merge this in a bit.
The `TakeSnapshot()` function in `_vendor/raft/server.go` was failing at `s.stateMachine.Save()` and exiting without resetting `s.pendingSnapshot = nil`. That left the raft server in an invalid state. `s.stateMachine.Save()` was failing with the error message "gob: encodeReflectValue: nil element", which comes from the `Save()` function in `cluster/cluster_configuration.go`. This is a separate issue.
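For context on that separate issue: in the Go standard library source, `encodeReflectValue` is a helper `encoding/gob` uses when encoding map keys and values, and gob does not permit nil pointers because they have no value. The sketch below reproduces the same class of error with a hypothetical `ServerInfo` type; whether `cluster/cluster_configuration.go` actually stores a nil entry in a map is an assumption, not something confirmed here.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// ServerInfo is a hypothetical value type, not taken from
// cluster/cluster_configuration.go.
type ServerInfo struct {
	Name string
}

// encode gob-encodes v into a throwaway buffer and returns any error.
func encode(v interface{}) error {
	var buf bytes.Buffer
	return gob.NewEncoder(&buf).Encode(v)
}

func main() {
	// A map holding a nil pointer value cannot be gob-encoded.
	bad := map[string]*ServerInfo{"a": {Name: "a"}, "b": nil}
	fmt.Println(encode(bad))

	// Dropping (or filling in) nil entries before encoding avoids the error.
	good := map[string]*ServerInfo{}
	for k, v := range bad {
		if v != nil {
			good[k] = v
		}
	}
	fmt.Println(encode(good)) // <nil>
}
```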